Imperfect DNA Repair and the Error Catastrophe 
In this Letter, we extend the semiconservative quasispecies equations to incorporate imperfect 
DNA lesion repair. We study the equilibrium behavior of this model in the limit of infinite sequence 
length and population size, using a single-fitness-peak landscape for which the master genome can 
sustain a finite number of lesions and remain viable. We provide a full analytical treatment of the 
problem, including a general mathematical framework as well as the full solution for a particular 
class of fitness landscapes. Stochastic simulations using finite sequence lengths and populations 
agree well with the analytical results. Applications to biological systems are briefly discussed. 
The quasispecies model of genomic evolution has been 
used to study a number of problems in evolutionary dy- 
namics. The central result of the theory is the existence 
of an upper mutational threshold beyond which natural 
selection can no longer occur [1]. Below this threshold, 
a replicating population of genomes will eventually pro- 
duce, over many generations, a "cloud" of closely related 
genomes clustered about one or a few fast replicating 
genomes. These "clouds" are termed quasispecies, and 
are characteristic of the evolutionary dynamics of many 
viruses, such as HIV [lHli. 

Above the mutational threshold, natural selection can 
no longer act to localize the population about the fast 
replicating genomes, and delocalization occurs over the 
entire genome space. This localization to delocalization 
transition is known as the error catastrophe [1, 5], and it 
corresponds to the disappearance of any viable strains in 
the population. The error catastrophe has been observed 
experimentally [3], and is believed to form the basis for 
a number of antiviral therapies Q, y, |^ - 

Because the quasispecies equations were originally de- 
veloped to deal with single-stranded RNA genomes, 
the model implicitly assumed a conservative replica- 
tion mechanism, where the original genome is preserved. 
However, in order to apply the quasispecies model to 
living systems, whose genomes are DNA-based, it was 
necessary to develop the quasispecies equations for semi- 
conservative replication. In semiconservative replication, 
a double-stranded genome unzips to form two strands, 
each of which is used as a template for the formation of 
two new complementary strands by the rules of Watson-Crick 
base pairing [9]. The original genome is destroyed 
by this process, and because replication errors can hap- 
pen in both daughter strand syntheses, it is possible that 
the two daughter genomes will differ from the parent. 

Daughter strand synthesis from the parent template 
strand is not error-free. Therefore, living systems have 
evolved a host of mechanisms which correct base-pair 
mismatches during replication (some of these mechanisms 
are built into the DNA replicases themselves; others, 
such as mismatch repair, act immediately following 
daughter strand synthesis). Nevertheless, after replication 
has occurred, the daughter genomes may still contain 
mismatched base-pairs. These mismatches result in 
lesions along the DNA chain, which are repaired by DNA 
repair and maintenance enzymes present in the cell. Un- 
like repair that occurs during daughter strand synthesis, 
during lesion repair the parent and daughter strands are 
indistinguishable, and hence correct repair occurs with a 
probability of 1/2. 

The semiconservative quasispecies equations were derived 
in [2], under the simplifying assumption that post-replication 
lesion repair is perfectly efficient. This Letter extends 
the original semiconservative quasispecies equations to 
account for the case when lesion repair is imperfect (the 
full details of the solution presented in this work may be 
found in [10]). Such an extension is necessary for a proper 
modeling of many important biological processes. Indeed, 
imperfect lesion repair was first studied in [11] in the 
context of modeling cancer. It may also be important for 
properly modeling asymmetric stem cell kinetics (the 
so-called "immortal strand" hypothesis) [12]. 

When lesion repair is perfectly efficient, double-stranded 
DNA consists of two complementary, antiparallel strands. 
Each DNA genome is defined by the pair of strands 
$\{\sigma, \bar{\sigma}\} = \{\bar{\sigma}, \sigma\}$, where $\bar{\sigma}$ denotes 
the complement of $\sigma$. If each base is drawn from an 
alphabet of size $S$ (where $S = 4$ for known terrestrial 
life), and if $\bar{b}_i$ denotes the complement of a base $b_i$, then 
if $\sigma = b_1 \ldots b_L$, we have, by the antiparallel nature of 
DNA, that $\bar{\sigma} = \bar{b}_L \ldots \bar{b}_1$. 
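The antiparallel complement can be illustrated with a minimal sketch. The string representation of a strand and the names `COMPLEMENT` and `complement_strand` are assumptions made for this example only, specialized to the $S = 4$ DNA alphabet.

```python
# Watson-Crick pairing for the S = 4 DNA alphabet (illustrative representation).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(sigma: str) -> str:
    """Return the antiparallel complement of sigma.

    If sigma = b_1 ... b_L, the result is comp(b_L) ... comp(b_1).
    """
    return "".join(COMPLEMENT[base] for base in reversed(sigma))

sigma = "AACG"
sigma_bar = complement_strand(sigma)
print(sigma, sigma_bar)
```

Applying `complement_strand` twice returns the original strand, which is the property that lets a perfectly repaired genome be described by a single strand.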

The replication of a DNA genome $\{\sigma, \bar{\sigma}\}$ may be 
divided into three stages: (1) Strand separation, where the 
genome unzips to produce two parent strands, $\sigma$ and $\bar{\sigma}$. 
(2) Daughter strand synthesis, where each parent strand 
serves as the template for the synthesis of a complementary 
daughter strand. (3) Lesion repair after cell division. 
This replication mechanism leads to the semiconservative 
quasispecies equations developed in [9]. 

When lesion repair is imperfect, the correlation between 
the two strands is broken, and we must consider 
a more general dynamics over genomes of the form 
$\{\sigma, \sigma'\}$, where both $\sigma$ and $\sigma'$ are arbitrary. Following the 
derivation in [9], we obtain the quasispecies equations 

$$
\frac{dx_{\{\sigma,\sigma'\}}}{dt} = -\left(\kappa_{\{\sigma,\sigma'\}} + \bar{\kappa}(t)\right) x_{\{\sigma,\sigma'\}}
+ \sum_{\{\sigma'',\sigma'''\}} \kappa_{\{\sigma'',\sigma'''\}} x_{\{\sigma'',\sigma'''\}}
\left[ p((\sigma'',\sigma'''),\{\sigma,\sigma'\}) + p((\sigma''',\sigma''),\{\sigma,\sigma'\}) \right]
\tag{1}
$$

where $p((\sigma'',\sigma'''),\{\sigma,\sigma'\})$ denotes the probability that 
strand $\sigma''$, as part of genome $\{\sigma'',\sigma'''\}$, becomes genome 
$\{\sigma,\sigma'\}$ after daughter strand synthesis and lesion repair. 
Here $x_{\{\sigma,\sigma'\}}$ denotes the fraction of the population with 
genome $\{\sigma,\sigma'\}$, and $\bar{\kappa}(t) = \sum_{\{\sigma,\sigma'\}} \kappa_{\{\sigma,\sigma'\}} x_{\{\sigma,\sigma'\}}$ is the 
mean fitness of the population. 

In the semiconservative quasispecies equations, the 
complementarity property allows one to convert the 
quasispecies dynamics over double-stranded genomes into an 
equivalent (and considerably simpler) dynamics over 
single strands [9]. With imperfect lesion repair, the lack 
of perfect correlation between the two strands in the 
genome makes a conversion to a single-strand model 
impossible. Nevertheless, we can make an analogous 
transformation of the dynamics, from double-stranded 
genomes $\{\sigma,\sigma'\}$ to ordered pairs of strands, $(\sigma,\sigma')$, as 
follows: We define $y_{(\sigma,\sigma')} = y_{(\sigma',\sigma)} = \frac{1}{2} x_{\{\sigma,\sigma'\}}$ if $\sigma \neq \sigma'$, 
and $y_{(\sigma,\sigma)} = x_{\{\sigma,\sigma\}}$. Also, we define $\kappa_{(\sigma,\sigma')} = \kappa_{(\sigma',\sigma)} = 
\kappa_{\{\sigma,\sigma'\}}$. Finally, we define $p((\sigma'',\sigma'''),(\sigma,\sigma'))$ to be the 
probability that $\sigma''$, as part of genome $\{\sigma'',\sigma'''\}$, becomes 
$\sigma$, with daughter strand $\sigma'$ (after daughter strand 
synthesis and lesion repair). Then it follows that 



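$$
p((\sigma'',\sigma'''),\{\sigma,\sigma'\}) =
\begin{cases}
p((\sigma'',\sigma'''),(\sigma,\sigma')) + p((\sigma'',\sigma'''),(\sigma',\sigma)) & \text{if } \sigma \neq \sigma' \\
p((\sigma'',\sigma'''),(\sigma,\sigma')) & \text{if } \sigma = \sigma'
\end{cases}
\tag{2}
$$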



Using these definitions, it is possible to convert the 
quasispecies equations over the space of double-stranded 
genomes to the space of ordered sequence pairs. After 
some manipulation, the final result is 

$$
\frac{dy_{(\sigma,\sigma')}}{dt} = -\left(\kappa_{(\sigma,\sigma')} + \bar{\kappa}(t)\right) y_{(\sigma,\sigma')}
+ \sum_{(\sigma'',\sigma''')} \kappa_{(\sigma'',\sigma''')} y_{(\sigma'',\sigma''')}
\left[ p((\sigma'',\sigma'''),(\sigma,\sigma')) + p((\sigma'',\sigma'''),(\sigma',\sigma)) \right].
\tag{3}
$$
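The change of variables from unordered genomes to ordered strand pairs can be sketched as follows. This is only an illustration: representing a genome as a `frozenset` of strand strings, the function name `to_ordered_pairs`, and the example fractions are all assumptions, and the factor of 1/2 for genomes with two distinct strands is the reading adopted here.

```python
def to_ordered_pairs(x):
    """Map fractions x over unordered genomes {s, t} to fractions y over ordered pairs."""
    y = {}
    for genome, fraction in x.items():
        strands = sorted(genome)
        if len(strands) == 1:
            # Genome {s, s}: a single ordered pair carries the whole fraction.
            s = strands[0]
            y[(s, s)] = fraction
        else:
            # Genome {s, t} with s != t: split evenly between (s, t) and (t, s).
            s, t = strands
            y[(s, t)] = fraction / 2
            y[(t, s)] = fraction / 2
    return y

# Hypothetical two-genome population, for illustration only.
x = {frozenset(["AACG"]): 0.4, frozenset(["AACG", "CGTT"]): 0.6}
y = to_ordered_pairs(x)
print(y, sum(y.values()))
```

With this convention the ordered-pair fractions remain normalized whenever the genome fractions are.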



To determine $p((\sigma'',\sigma'''),(\sigma,\sigma'))$, we introduce some 
additional definitions. Define $\sigma_C$ to be the subsequence 
of bases in $\sigma$ which are complementary with the 
corresponding bases in $\sigma'$. That is, suppose $\sigma = b_1 \ldots b_L$ 
and $\sigma' = b'_1 \ldots b'_L$, and suppose for indices 
$i_1 < i_2 < \cdots < i_k$ we have that $b_{i_j} = \bar{b}'_{L-i_j+1}$. 
Then $\sigma_C = b_{i_1} \ldots b_{i_k}$. We also define 
$\sigma'_C$ to be the subsequence of corresponding bases in $\sigma'$, 
so that $\sigma'_C = b'_{L-i_k+1} \ldots b'_{L-i_1+1}$. Finally, let $\sigma''_C$ denote 
the subsequence of bases in $\sigma'' = b''_1 \ldots b''_L$ corresponding to the bases 
in $\sigma_C$, so that $\sigma''_C = b''_{i_1} \ldots b''_{i_k}$. 

Now, define $\sigma_{NC}$ to be the subsequence of bases in 
$\sigma$ which are not complementary with the corresponding 
bases in $\sigma'$. That is, given the complementary indices 
$i_1 < i_2 < \cdots < i_k$ defined above, let $i'_1 < i'_2 < \cdots < i'_{L-k}$ 
be the remaining indices. Then $\sigma_{NC} = b_{i'_1} \ldots b_{i'_{L-k}}$. We 
define $\sigma'_{NC}$ to be the subsequence of corresponding bases 
in $\sigma'$, so that $\sigma'_{NC} = b'_{L-i'_{L-k}+1} \ldots b'_{L-i'_1+1}$. Finally, we 
let $\sigma''_{NC}$ denote the subsequence of bases in $\sigma''$ 
corresponding to the bases in $\sigma_{NC}$, so that 
$\sigma''_{NC} = b''_{i'_1} \ldots b''_{i'_{L-k}}$. 



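We also assume that daughter strand synthesis during 
replication of the genome $\{\sigma'',\sigma'''\}$ is characterized by a 
per base-pair mismatch probability of $\epsilon_{\{\sigma'',\sigma'''\}}$, and we 
define $\epsilon_{(\sigma'',\sigma''')} = \epsilon_{(\sigma''',\sigma'')} = \epsilon_{\{\sigma'',\sigma'''\}}$. Finally, we define 
$\lambda$ to be the probability that a post-replicative lesion is 
repaired. This gives 

$$
p((\sigma'',\sigma'''),(\sigma,\sigma')) = \delta_{\sigma''_{NC},\sigma_{NC}}
\left( \frac{\lambda}{2} \frac{\epsilon_{(\sigma'',\sigma''')}}{S-1} \right)^{D_H(\sigma''_C,\sigma_C)}
\left( 1 - \epsilon_{(\sigma'',\sigma''')} \left(1 - \frac{\lambda}{2}\right) \right)^{L_C - D_H(\sigma''_C,\sigma_C)}
\left( \frac{\epsilon_{(\sigma'',\sigma''')} (1-\lambda)}{S-1} \right)^{L - L_C}
\tag{4}
$$

where $D_H$ denotes the Hamming distance and $L_C$ denotes the length of $\sigma_C$. 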
For $\lambda = 1$ (all lesions are repaired), our equations reduce 
to the ordinary semiconservative quasispecies equations [9]. 
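The per-position bookkeeping behind the transition probability can be sketched as below. This is a hedged reading of the model rather than the authors' code: it assumes a synthesis error occurs with probability `eps`, that the resulting lesion is repaired with probability `lam`, and that a repaired lesion is resolved in favor of the parent strand with probability 1/2. The function name and dictionary keys are hypothetical.

```python
def per_site_outcomes(eps: float, lam: float) -> dict:
    """Probabilities of the three possible fates of one base pair after replication."""
    return {
        # No synthesis error, or an error that repair resolves toward the parent strand.
        "unchanged": 1.0 - eps * (1.0 - lam / 2.0),
        # An error that repair resolves toward the daughter strand: the parent base mutates.
        "parent_mutated": eps * lam / 2.0,
        # An error that is never repaired: the base pair remains a lesion.
        "lesion": eps * (1.0 - lam),
    }

outcomes = per_site_outcomes(eps=0.01, lam=0.5)
print(outcomes, sum(outcomes.values()))
```

The three probabilities sum to one for any `eps` and `lam`; at `lam = 1` no lesions survive, and at `lam = 0` the parent strand is never altered.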

In the simplest case, we assume that $\epsilon_{\{\sigma,\sigma'\}}$ is 
genome-independent, and hence may be denoted by $\epsilon$. We also 
define $\mu \equiv L\epsilon$, and consider the quasispecies dynamics 
at fixed $\mu$ in the limit of $L \to \infty$. Note that 
$\lim_{L \to \infty,\, L\epsilon = \mu} (1 - \epsilon)^L = e^{-\mu}$, so fixing $\mu$ is equivalent to 
holding the correct daughter strand synthesis probability 
constant in the limit of infinite sequence length. 
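A quick numerical check of this limit, as a minimal sketch (the function name and the chosen value of `mu` are illustrative assumptions):

```python
import math

def correct_synthesis_probability(L: int, mu: float) -> float:
    """Probability that all L bases are copied correctly when eps = mu / L."""
    eps = mu / L
    return (1.0 - eps) ** L

mu = 1.5
for L in (10, 100, 10_000, 1_000_000):
    # The finite-L value approaches exp(-mu) as L grows.
    print(L, correct_synthesis_probability(L, mu), math.exp(-mu))
```

The gap to $e^{-\mu}$ shrinks roughly like $\mu^2/(2L)$, which gives a sense of how large $L$ must be before the infinite-length results apply.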



We now consider a generalized "single-fitness-peak" 
landscape, characterized by a "master" genome $\{\sigma_0, \bar{\sigma}_0\}$. 
A given genome $\{\sigma,\sigma'\}$ is viable, with a first-order growth 
rate constant $k > 1$, if it is equal to the master genome 
up to at most $l$ lesions. Otherwise, the genome is 
unviable, with a growth rate constant of 1. 

In the limit of infinite sequence length, it may be shown 
that, with probability one, the Hamming distance 
between $\sigma_0$ and $\bar{\sigma}_0$ is infinite [2]. Therefore, we may 
regard $(\sigma_0,\bar{\sigma}_0)$ and $(\bar{\sigma}_0,\sigma_0)$ as infinitely separated in the 
sequence-pair space, and so, by an appropriate 
transformation of Eq. (3), we may consider the local 
dynamics about each sequence pair independently of the 
other. Thus, we consider the dynamics of the $y_{(\sigma,\sigma')}$ 
for two types of $(\sigma,\sigma')$: First, we consider $(\sigma,\sigma')$ such 
that $D_H(\sigma,\sigma_0)$, $D_H(\sigma',\bar{\sigma}_0)$ are finite, and second, we 
consider $(\sigma,\sigma')$ such that $D_H(\sigma,\bar{\sigma}_0)$, $D_H(\sigma',\sigma_0)$ are 
finite. If $(\sigma,\sigma')$ belongs to the first type of sequence pairs, 
then it is clear that $(\sigma',\sigma)$ belongs to the second type. 
The symmetry of the landscape means that the 
dynamics about one sequence pair completely determines the 
dynamics about the other. 

A given sequence pair $(\sigma,\sigma')$ of the first type can be 
characterized by the four parameters $l_C$, $l_L$, $l_R$, and $l_B$. 
The first parameter, $l_C$, denotes the number of positions 
where $\sigma$, $\sigma'$ are complementary, yet differ from the 
corresponding positions in $\sigma_0$, $\bar{\sigma}_0$, respectively. The second 
parameter, $l_L$, denotes the number of positions where $\sigma$ 
differs from $\sigma_0$, but the complementary positions in $\sigma'$ 
are equal to the corresponding ones in $\bar{\sigma}_0$. The third 
parameter, $l_R$, denotes the number of positions where $\sigma$ is 
equal to the ones in $\sigma_0$, but the complementary positions 
in $\sigma'$ differ from the corresponding ones in $\bar{\sigma}_0$. Finally, 
the fourth parameter, $l_B$, denotes the number of positions 
where $\sigma$, $\sigma'$ are not complementary, and also differ from 
the corresponding positions in $\sigma_0$ and $\bar{\sigma}_0$, respectively. 
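The four parameters can be computed for concrete sequences along the following lines. This is a sketch under stated assumptions: strands are strings over the DNA alphabet, base $i$ of the first strand pairs with base $L - i + 1$ of the second (antiparallel pairing), and the name `classify_pair` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def classify_pair(sigma: str, sigma_p: str, sigma0: str):
    """Return (l_C, l_L, l_R, l_B) for the ordered pair (sigma, sigma_p) relative to sigma0."""
    L = len(sigma0)
    if not (len(sigma) == len(sigma_p) == L):
        raise ValueError("all strands must have the same length")
    l_C = l_L = l_R = l_B = 0
    for i in range(L):
        a = sigma[i]              # base on the first strand
        b = sigma_p[L - 1 - i]    # its partner on the antiparallel second strand
        a0 = sigma0[i]            # master base on the first strand
        b0 = COMPLEMENT[a0]       # master base on the complementary strand
        a_differs = a != a0
        b_differs = b != b0
        if b == COMPLEMENT[a]:
            if a_differs:
                l_C += 1          # complementary pair, but both bases differ from the master
        elif a_differs and b_differs:
            l_B += 1              # lesion with both bases differing from the master
        elif a_differs:
            l_L += 1              # lesion: only the first strand differs
        else:
            l_R += 1              # lesion: only the second strand differs
    return l_C, l_L, l_R, l_B

print(classify_pair("CACG", "CGTT", "AACG"))
```

For the master strand `"AACG"`, whose antiparallel complement is `"CGTT"`, changing only the first base of the first strand produces a single lesion counted in $l_L$.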

For our generalized single-fitness-peak model, the 
fitness of a given sequence pair $(\sigma,\sigma')$ of the first type is 
determined by $l_C$, $l_L$, $l_R$, $l_B$, hence we may write 
$\kappa_{(\sigma,\sigma')} = \kappa_{(l_C,l_L,l_R,l_B)}$. Specifically, 
$\kappa_{(l_C,l_L,l_R,l_B)} = k > 1$ if $l_C = 0$ 
and if $l_L + l_R + l_B \leq l$. Otherwise, $\kappa_{(l_C,l_L,l_R,l_B)} = 1$. 
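The landscape is simple enough to state as a short function. This is a minimal sketch; the function name `kappa` and its argument order are assumptions, and `l` may be `math.inf` to represent a genome that tolerates any number of lesions.

```python
import math

def kappa(l_C: int, l_L: int, l_R: int, l_B: int, k: float, l: float) -> float:
    """Growth rate on the generalized single-fitness-peak landscape.

    A pair is viable (rate k) when it has no fixed mutations (l_C == 0)
    and carries at most l lesions; otherwise its rate is 1.
    """
    viable = (l_C == 0) and (l_L + l_R + l_B <= l)
    return k if viable else 1.0

print(kappa(0, 1, 1, 0, k=10.0, l=2))          # viable: two lesions tolerated
print(kappa(0, 3, 0, 0, k=10.0, l=2))          # unviable: too many lesions
print(kappa(0, 50, 0, 0, k=10.0, l=math.inf))  # viable for any number of lesions
```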



We define $z_{(l_C,l_L,l_R,l_B)}$ to be the total fraction of the 
population whose genomes are characterized by the 
parameters $l_C$, $l_L$, $l_R$, $l_B$. Note that we can consider these 
same parameters as characterizing genomes of the second 
type (i.e., defined by the ordered pair $(\bar{\sigma}_0,\sigma_0)$), and 
consider the corresponding population fraction $\tilde{z}_{(l_C,l_L,l_R,l_B)}$. 
It should be clear that, by symmetry, 
$\tilde{z}_{(l_C,l_L,l_R,l_B)} = z_{(l_C,l_R,l_L,l_B)}$. 

Because the fitness is determined only by $l_C$, $l_L$, $l_R$, 
and $l_B$, it follows that we may presymmetrize our 
population and reexpress the quasispecies dynamics in terms 
of the $z_{(l_C,l_L,l_R,l_B)}$. In [10] we show that the neglect of 
backmutations in the limit of infinite sequence length 
implies that we may set $z_{(l_C,l_L,l_R,l_B)} = 0$ when $l_B \neq 0$, and 
when $l_L$, $l_R$ are simultaneously nonzero. Therefore, the 
relevant equations are 



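$$
\frac{dz_{(l_C,0,0,0)}}{dt} = -\left(\kappa_{(l_C,0,0,0)} + \bar{\kappa}(t)\right) z_{(l_C,0,0,0)}
+ e^{-\mu(1-\lambda/2)} \sum_{l_1=0}^{l_C} \frac{1}{l_1!} \left(\frac{\mu\lambda}{2}\right)^{l_1}
\left[ \sum_{l'_L=0}^{l_C-l_1} \sum_{l'_R=0}^{\infty} \kappa_{(l_C-l_1-l'_L,\,l'_L,\,l'_R,\,0)} z_{(l_C-l_1-l'_L,\,l'_L,\,l'_R,\,0)}
+ \sum_{l'_R=0}^{l_C-l_1} \sum_{l'_L=0}^{\infty} \kappa_{(l_C-l_1-l'_R,\,l'_L,\,l'_R,\,0)} z_{(l_C-l_1-l'_R,\,l'_L,\,l'_R,\,0)} \right]
\tag{5}
$$

$$
\frac{dz_{(l_C,l_L,0,0)}}{dt} = -\left(\kappa_{(l_C,l_L,0,0)} + \bar{\kappa}(t)\right) z_{(l_C,l_L,0,0)}
+ \frac{1}{l_L!} (\mu(1-\lambda))^{l_L} e^{-\mu(1-\lambda/2)} \sum_{l_1=0}^{l_C} \frac{1}{l_1!} \left(\frac{\mu\lambda}{2}\right)^{l_1}
\sum_{l'_R=0}^{l_C-l_1} \sum_{l'_L=0}^{\infty} \kappa_{(l_C-l_1-l'_R,\,l'_L,\,l'_R,\,0)} z_{(l_C-l_1-l'_R,\,l'_L,\,l'_R,\,0)}
\tag{6}
$$

$$
\frac{dz_{(l_C,0,l_R,0)}}{dt} = -\left(\kappa_{(l_C,0,l_R,0)} + \bar{\kappa}(t)\right) z_{(l_C,0,l_R,0)}
+ \frac{1}{l_R!} (\mu(1-\lambda))^{l_R} e^{-\mu(1-\lambda/2)} \sum_{l_1=0}^{l_C} \frac{1}{l_1!} \left(\frac{\mu\lambda}{2}\right)^{l_1}
\sum_{l'_L=0}^{l_C-l_1} \sum_{l'_R=0}^{\infty} \kappa_{(l_C-l_1-l'_L,\,l'_L,\,l'_R,\,0)} z_{(l_C-l_1-l'_L,\,l'_L,\,l'_R,\,0)}
\tag{7}
$$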
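One way to check the structure of these equations numerically is to iterate the dynamics restricted to genomes with $l_C = 0$. The sketch below is not the authors' procedure; it rests on several assumptions that should be read as such: in the no-backmutation limit only an error-free template strand can produce $l_C = 0$ offspring, the number of unrepaired lesions on a new pair is Poisson-distributed with mean $\mu(1-\lambda)$, and the lesion count is truncated at a hypothetical cutoff `n_max`. The equilibrium mean fitness is then estimated as the dominant growth rate by shifted power iteration, which is only meaningful below the error catastrophe.

```python
import math

def mean_fitness_lc0_sector(k, mu, lam, l=math.inf, n_max=60, iterations=500):
    """Dominant growth rate of the l_C = 0 sector, by shifted power iteration."""
    # Probability that a new pair has neither a fixed mutation nor a lesion.
    P = math.exp(-mu * (1.0 - lam / 2.0))
    # Relative weight of n unrepaired lesions on the new pair.
    q = [(mu * (1.0 - lam)) ** n / math.factorial(n) for n in range(n_max + 1)]
    # Growth rate of an l_C = 0 genome carrying n lesions.
    kap = [k if n <= l else 1.0 for n in range(n_max + 1)]
    # a ~ z_(0,0,0,0); b[n] ~ z_(0,n,0,0); c[n] ~ z_(0,0,n,0); index 0 of b, c unused.
    a, b, c = 1.0, [0.0] * (n_max + 1), [0.0] * (n_max + 1)
    rate = 0.0
    for _ in range(iterations):
        # Fitness-weighted supply of error-free template strands on each side.
        w_b = kap[0] * a + sum(kap[n] * b[n] for n in range(1, n_max + 1))
        w_c = kap[0] * a + sum(kap[n] * c[n] for n in range(1, n_max + 1))
        # Apply the generator plus a shift of k, which keeps all entries non-negative.
        new_a = (k - kap[0]) * a + P * q[0] * (w_b + w_c)
        new_b = [0.0] + [(k - kap[n]) * b[n] + P * q[n] * w_b for n in range(1, n_max + 1)]
        new_c = [0.0] + [(k - kap[n]) * c[n] + P * q[n] * w_c for n in range(1, n_max + 1)]
        total = new_a + sum(new_b) + sum(new_c)
        rate = total / (a + sum(b) + sum(c)) - k
        a = new_a / total
        b = [v / total for v in new_b]
        c = [v / total for v in new_c]
    return rate

print(mean_fitness_lc0_sector(k=10.0, mu=1.0, lam=0.5))
```

For $l = \infty$ the estimate can be compared with $k(e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1)$, and for $\lambda = 1$ with the semiconservative value $k(2e^{-\mu/2} - 1)$.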



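The above equations may be used to solve for the 
equilibrium mean fitness $\bar{\kappa}(t = \infty)$ for the generalized 
single-fitness-peak landscape. Below the error catastrophe, the 
result is 

$$
\bar{\kappa}(t = \infty) = \frac{1}{2} \left[ A(\mu,\lambda) + \sqrt{A(\mu,\lambda)^2 + 4B(\mu,\lambda)} \right]
\tag{8}
$$

where $A(\mu,\lambda) \equiv k((1 + f_l(\mu,\lambda)) e^{-\mu(1-\lambda/2)} - 1) - (1 + f_l(\mu,\lambda) e^{-\mu(1-\lambda/2)} - e^{-\mu\lambda/2})$, 
$B(\mu,\lambda) \equiv k(e^{-\mu\lambda/2} + e^{-\mu(1-\lambda/2)} - 1)$, and where we define 
$f_l(\mu,\lambda) \equiv \sum_{l'=0}^{l} \frac{1}{l'!} (\mu(1-\lambda))^{l'}$. 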
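A closed-form evaluation of the equilibrium mean fitness can be sketched as follows. The explicit forms used for `A`, `B`, and `f_l` are assumptions: they are a reconstruction chosen to agree with the limiting cases quoted in the text ($f_l(\mu,1) = 1$, $f_\infty(\mu,\lambda) = e^{\mu(1-\lambda)}$, the semiconservative result at $\lambda = 1$, and the conservative result at $\lambda = 0$ with $l = \infty$), and the function names are hypothetical.

```python
import math

def f_l(mu: float, lam: float, l: float) -> float:
    """Truncated exponential series in mu * (1 - lam); l may be math.inf."""
    if math.isinf(l):
        return math.exp(mu * (1.0 - lam))
    return sum((mu * (1.0 - lam)) ** n / math.factorial(n) for n in range(int(l) + 1))

def mean_fitness(k: float, mu: float, lam: float, l: float) -> float:
    """Equilibrium mean fitness below the error catastrophe (assumed form of Eq. (8))."""
    P = math.exp(-mu * (1.0 - lam / 2.0))
    E = math.exp(-mu * lam / 2.0)
    f = f_l(mu, lam, l)
    A = k * ((1.0 + f) * P - 1.0) - (1.0 + f * P - E)
    B = k * (E + P - 1.0)
    return 0.5 * (A + math.sqrt(A * A + 4.0 * B))

for lam in (0.0, 0.5, 1.0):
    print(lam, mean_fitness(k=10.0, mu=1.0, lam=lam, l=math.inf))
```

The result is only meaningful while it exceeds 1, the growth rate of the unviable genomes.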

The error catastrophe occurs when the mean equilibrium 
fitness determined by Eq. (8) becomes equal to 
the growth rate of the unviable genomes. At this point, 
the selective advantage of the viable genomes is no longer 
sufficiently strong to localize the population, and 
delocalization occurs over the entire genome space. The critical 
$\mu$ is therefore found by setting $\bar{\kappa}(t = \infty) = 1$ in Eq. (8), 
and solving for $\mu$. The resulting expression is 

$$
e^{-\mu_{crit}(1-\lambda/2)} = \frac{(k+1)\left(2 - e^{-\mu_{crit}\lambda/2}\right)}{k(2 + f_l(\mu_{crit},\lambda)) - f_l(\mu_{crit},\lambda)}
\tag{9}
$$
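Because the critical condition is implicit in $\mu$, it is convenient to solve it numerically. The sketch below does so by bisection for the $l = \infty$ landscape only, using the mean fitness $k(e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1)$ quoted in the text for that case. The function names, the search bracket `hi`, and the tolerance are assumptions for illustration.

```python
import math

def mean_fitness_inf(k: float, mu: float, lam: float) -> float:
    """Equilibrium mean fitness for l = infinity, below the error catastrophe."""
    return k * (math.exp(-mu * (1.0 - lam / 2.0)) + math.exp(-mu * lam / 2.0) - 1.0)

def critical_mu(k: float, lam: float, hi: float = 50.0, tol: float = 1e-12) -> float:
    """Bisect for the mu at which the mean fitness drops to 1 (requires k > 1)."""
    lo = 0.0
    # The mean fitness decreases monotonically in mu, from k at mu = 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_fitness_inf(k, mid, lam) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for lam in (0.0, 0.5, 1.0):
    print(lam, critical_mu(k=10.0, lam=lam))
```

At $\lambda = 1$ the threshold is $2\ln[2k/(k+1)]$, and at $\lambda = 0$ it is $\ln k$, the conservative value.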



It is instructive to study the behavior of $\bar{\kappa}(t = \infty)$ for 
specific landscapes and values of $\lambda$. First of all, note that 
$f_l(\mu, 1) = 1$, giving $\bar{\kappa}(t = \infty) = k(2e^{-\mu/2} - 1)$, which is 
exactly the expected semiconservative result with perfect 
lesion repair [9]. Also, note that $f_\infty(\mu,\lambda) = e^{\mu(1-\lambda)}$, 
which gives $\bar{\kappa}(t = \infty) = k(e^{-\mu(1-\lambda/2)} + e^{-\mu\lambda/2} - 1)$. 
For $\lambda = 1$, we of course recover the semiconservative 
result. However, for $\lambda = 0$, we obtain $\bar{\kappa}(t = \infty) = ke^{-\mu}$, 
which is exactly the result expected from conservative 
replication. Therefore, when only one perfect strand in 
a double-stranded genome is necessary for the organism 
to remain viable, we recover an effectively conservatively 
replicating system in the absence of lesion repair. 



In Figure 1 we show some results of stochastic sim- 
ulations of replicating genomes, which corroborate the 
analytical results obtained from our theory. 

The recent incorporation of semiconservative replica- 
tion into the quasispecies model was an important step 
toward modeling real systems that revealed a number 
of important dynamical signatures absent in the original 
model. However, the initial assumption used in previ- 
ous semiconservative works, namely that post-replication 
DNA repair is perfect, is clearly an oversimplification 
that is particularly poor for some of the most scientif- 
ically interesting systems such as cancer and stem cells 
[10, 11, 12]. This approximation introduces a false symmetry 
that can drastically alter the evolutionary behavior 
and equilibria. By providing a full treatment of semicon- 
servative quasispecies dynamics with partially activated 
lesion repair, we have taken a significant step forward in 
the modeling of genomic evolution. 

This research was supported by the National Institutes 
of Health. The authors would like to thank Prof. James 
L. Sherley and Franziska Michor for useful discussions. 
FIG. 1: Plots of $\bar{\kappa}(t = \infty)$ versus $\mu$, from both stochastic 
simulation and theory. We took $l = \infty$. For our stochastic 
simulations, we averaged our results over 10 runs, using 
sequence lengths of 20, and a population size of 1,000 organisms. 